<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
                "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
  <title>Description of v_vadsohn</title>
  <meta name="keywords" content="v_vadsohn">
  <meta name="description" content="V_VADSOHN implements a voice activity detector [VS,ZO]=(S,FSZ,M,P)">
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  <meta name="generator" content="m2html &copy; 2003 Guillaume Flandin">
  <meta name="robots" content="index, follow">
  <link type="text/css" rel="stylesheet" href="../m2html.css">
</head>
<body>
<a name="_top"></a>
<div><a href="../index.html">Home</a> &gt;  <a href="index.html">v_mfiles</a> &gt; v_vadsohn.m</div>

<!--<table width="100%"><tr><td align="left"><a href="../index.html"><img alt="<" border="0" src="../left.png">&nbsp;Master index</a></td>
<td align="right"><a href="index.html">Index for v_mfiles&nbsp;<img alt=">" border="0" src="../right.png"></a></td></tr></table>-->

<h1>v_vadsohn
</h1>

<h2><a name="_name"></a>PURPOSE <a href="#_top"><img alt="^" border="0" src="../up.png"></a></h2>
<div class="box"><strong>V_VADSOHN implements a voice activity detector [VS,ZO]=(S,FSZ,M,P)</strong></div>

<h2><a name="_synopsis"></a>SYNOPSIS <a href="#_top"><img alt="^" border="0" src="../up.png"></a></h2>
<div class="box"><strong>function [vs,zo]=v_vadsohn(si,fsz,m,pp) </strong></div>

<h2><a name="_description"></a>DESCRIPTION <a href="#_top"><img alt="^" border="0" src="../up.png"></a></h2>
<div class="fragment"><pre class="comment">V_VADSOHN implements a voice activity detector [VS,ZO]=(S,FSZ,M,P)

 Inputs:
   si      input speech signal
   fsz     sample frequency in Hz
           Alternatively, the input state from a previous call (see below)
   m       mode [default = 'a']:
               'n'   output frame start/end in samples
               't'   output frame start/end in seconds
               'a'   output activity decision [default]
               'b'   output activity likelihood ratio
               'p'   plot graph [default if no outputs]
   pp      algorithm parameters [optional]

 Outputs:
   vs(:,n)     outputs in the order [t1,t2,a,b] as selected by m options
               if 'n' or 't' is specified in m, vs has one row per frame and
               t1 and t2 give the frame start end end times (in samples or seconds).
               otherwise, vs has one row per sample.
   zo          output state (used as input for a subsequent call).

 The algorithm operation is controlled by a small number of parameters:

        pp.of          % overlap factor = (fft length)/(frame increment) [2]
        pp.ti          % desired output frame increment (10 ms)
        pp.tj          % internal frame increment (10 ms)
        pp.ri          % set to 1 to round tj to the nearest power of 2 samples [0]
        pp.ta          % Time const for smoothing SNR estimate [0.396 seconds]
        pp.gx          % maximum posterior SNR as a power ratio [1000 = 30dB]
        pp.gz          % minimum posterior SNR as a power ratio [0.0001 = -40dB]
        pp.xn          % minimum prior SNR [0]
        pp.pr          % Speech probability threshold [0.7]
        pp.ts          % mean talkspurt length (100 ms)
        pp.tn          % mean silence length (50 ms)
        pp.ne          % noise estimation: 0=min statistics, 1=MMSE [0]

 In addition it is possible to specify parameters for the noise estimation algorithm
 which implements reference [3] from which equation numbers are given in parentheses.
 They are as follows:

        pp.taca      % (11): smoothing time constant for alpha_c [0.0449 seconds]
        pp.tamax     % (3): max smoothing time constant [0.392 seconds]
        pp.taminh    % (3): min smoothing time constant (upper limit) [0.0133 seconds]
        pp.tpfall    % (12): time constant for P to fall [0.064 seconds]
        pp.tbmax     % (20): max smoothing time constant [0.0717 seconds]
        pp.qeqmin    % (23): minimum value of Qeq [2]
        pp.qeqmax    % max value of Qeq per frame [14]
        pp.av        % (23)+13 lines: fudge factor for bc calculation  [2.12]
        pp.td        % time to take minimum over [1.536 seconds]
        pp.nu        % number of subwindows to use [3]
        pp.qith      % Q-inverse thresholds to select maximum noise slope [0.03 0.05 0.06 Inf ]
        pp.nsmdb     % corresponding noise slope thresholds in dB/second   [47 31.4 15.7 4.1]


 If convenient, you can call v_vadsohn in chunks of arbitrary size. Thus the following are equivalent:

                   (a) y=v_vadsohn(s,fs);

                   (b) [y1,z]=v_vadsohn(s(1:1000),fs);
                       [y2,z]=v_vadsohn(s(1001:2000),z);
                       y3=v_vadsohn(s(2001:end),z);
                       y=[y1; y2; y3];

 Note that in all cases the number of output samples will be less than the number of input samples if
 the input length is not an exact number of frames. In addition, if two output arguments
 are specified, the last partial frame samples will be retained for overlap adding
 with the output from the next call to v_specsub().

 Refs:
    [1] J. Sohn, N. S. Kim, and W. Sung.
        A statistical model-based voice activity detection.
        IEEE Signal Processing Lett., 6 (1): 1-3, 1999. doi: 10.1109/97.736233.
    [2] Ephraim, Y. &amp; Malah, D.
        Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator
        IEEE Trans Acoustics Speech and Signal Processing, 32(6):1109-1121, Dec 1984
    [3] Rainer Martin.
        Noise power spectral density estimation based on optimal smoothing and minimum statistics.
        IEEE Trans. Speech and Audio Processing, 9(5):504-512, July 2001.</pre></div>

<!-- crossreference -->
<h2><a name="_cross"></a>CROSS-REFERENCE INFORMATION <a href="#_top"><img alt="^" border="0" src="../up.png"></a></h2>
This function calls:
<ul style="list-style-image:url(../matlabicon.gif)">
<li><a href="v_enframe.html" class="code" title="function [f,t,w]=v_enframe(x,win,hop,m,fs)">v_enframe</a>	V_ENFRAME split signal up into (overlapping) frames: one per row. [F,T]=(X,WIN,HOP)</li><li><a href="v_estnoiseg.html" class="code" title="function [x,zo]=v_estnoiseg(yf,tz,pp)">v_estnoiseg</a>	V_ESTNOISEG - estimate MMSE noise spectrum [x,zo]=(yf,tz,pp)</li><li><a href="v_estnoisem.html" class="code" title="function [x,zo,xs]=v_estnoisem(yf,tz,pp)">v_estnoisem</a>	V_ESTNOISEM - estimate noise spectrum using minimum statistics</li><li><a href="v_rfft.html" class="code" title="function y=v_rfft(x,n,d)">v_rfft</a>	V_RFFT     Calculate the DFT of real data Y=(X,N,D)</li></ul>
This function is called by:
<ul style="list-style-image:url(../matlabicon.gif)">
<li><a href="v_snrseg.html" class="code" title="function [seg,glo]=v_snrseg(s,r,fs,m,tf)">v_snrseg</a>	V_SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF)</li></ul>
<!-- crossreference -->


<h2><a name="_source"></a>SOURCE CODE <a href="#_top"><img alt="^" border="0" src="../up.png"></a></h2>
<div class="fragment"><pre>0001 <a name="_sub0" href="#_subfunctions" class="code">function [vs,zo]=v_vadsohn(si,fsz,m,pp)</a>
0002 <span class="comment">%V_VADSOHN implements a voice activity detector [VS,ZO]=(S,FSZ,M,P)</span>
0003 <span class="comment">%</span>
0004 <span class="comment">% Inputs:</span>
0005 <span class="comment">%   si      input speech signal</span>
0006 <span class="comment">%   fsz     sample frequency in Hz</span>
0007 <span class="comment">%           Alternatively, the input state from a previous call (see below)</span>
0008 <span class="comment">%   m       mode [default = 'a']:</span>
0009 <span class="comment">%               'n'   output frame start/end in samples</span>
0010 <span class="comment">%               't'   output frame start/end in seconds</span>
0011 <span class="comment">%               'a'   output activity decision [default]</span>
0012 <span class="comment">%               'b'   output activity likelihood ratio</span>
0013 <span class="comment">%               'p'   plot graph [default if no outputs]</span>
0014 <span class="comment">%   pp      algorithm parameters [optional]</span>
0015 <span class="comment">%</span>
0016 <span class="comment">% Outputs:</span>
0017 <span class="comment">%   vs(:,n)     outputs in the order [t1,t2,a,b] as selected by m options</span>
0018 <span class="comment">%               if 'n' or 't' is specified in m, vs has one row per frame and</span>
0019 <span class="comment">%               t1 and t2 give the frame start end end times (in samples or seconds).</span>
0020 <span class="comment">%               otherwise, vs has one row per sample.</span>
0021 <span class="comment">%   zo          output state (used as input for a subsequent call).</span>
0022 <span class="comment">%</span>
0023 <span class="comment">% The algorithm operation is controlled by a small number of parameters:</span>
0024 <span class="comment">%</span>
0025 <span class="comment">%        pp.of          % overlap factor = (fft length)/(frame increment) [2]</span>
0026 <span class="comment">%        pp.ti          % desired output frame increment (10 ms)</span>
0027 <span class="comment">%        pp.tj          % internal frame increment (10 ms)</span>
0028 <span class="comment">%        pp.ri          % set to 1 to round tj to the nearest power of 2 samples [0]</span>
0029 <span class="comment">%        pp.ta          % Time const for smoothing SNR estimate [0.396 seconds]</span>
0030 <span class="comment">%        pp.gx          % maximum posterior SNR as a power ratio [1000 = 30dB]</span>
0031 <span class="comment">%        pp.gz          % minimum posterior SNR as a power ratio [0.0001 = -40dB]</span>
0032 <span class="comment">%        pp.xn          % minimum prior SNR [0]</span>
0033 <span class="comment">%        pp.pr          % Speech probability threshold [0.7]</span>
0034 <span class="comment">%        pp.ts          % mean talkspurt length (100 ms)</span>
0035 <span class="comment">%        pp.tn          % mean silence length (50 ms)</span>
0036 <span class="comment">%        pp.ne          % noise estimation: 0=min statistics, 1=MMSE [0]</span>
0037 <span class="comment">%</span>
0038 <span class="comment">% In addition it is possible to specify parameters for the noise estimation algorithm</span>
0039 <span class="comment">% which implements reference [3] from which equation numbers are given in parentheses.</span>
0040 <span class="comment">% They are as follows:</span>
0041 <span class="comment">%</span>
0042 <span class="comment">%        pp.taca      % (11): smoothing time constant for alpha_c [0.0449 seconds]</span>
0043 <span class="comment">%        pp.tamax     % (3): max smoothing time constant [0.392 seconds]</span>
0044 <span class="comment">%        pp.taminh    % (3): min smoothing time constant (upper limit) [0.0133 seconds]</span>
0045 <span class="comment">%        pp.tpfall    % (12): time constant for P to fall [0.064 seconds]</span>
0046 <span class="comment">%        pp.tbmax     % (20): max smoothing time constant [0.0717 seconds]</span>
0047 <span class="comment">%        pp.qeqmin    % (23): minimum value of Qeq [2]</span>
0048 <span class="comment">%        pp.qeqmax    % max value of Qeq per frame [14]</span>
0049 <span class="comment">%        pp.av        % (23)+13 lines: fudge factor for bc calculation  [2.12]</span>
0050 <span class="comment">%        pp.td        % time to take minimum over [1.536 seconds]</span>
0051 <span class="comment">%        pp.nu        % number of subwindows to use [3]</span>
0052 <span class="comment">%        pp.qith      % Q-inverse thresholds to select maximum noise slope [0.03 0.05 0.06 Inf ]</span>
0053 <span class="comment">%        pp.nsmdb     % corresponding noise slope thresholds in dB/second   [47 31.4 15.7 4.1]</span>
0054 <span class="comment">%</span>
0055 <span class="comment">%</span>
0056 <span class="comment">% If convenient, you can call v_vadsohn in chunks of arbitrary size. Thus the following are equivalent:</span>
0057 <span class="comment">%</span>
0058 <span class="comment">%                   (a) y=v_vadsohn(s,fs);</span>
0059 <span class="comment">%</span>
0060 <span class="comment">%                   (b) [y1,z]=v_vadsohn(s(1:1000),fs);</span>
0061 <span class="comment">%                       [y2,z]=v_vadsohn(s(1001:2000),z);</span>
0062 <span class="comment">%                       y3=v_vadsohn(s(2001:end),z);</span>
0063 <span class="comment">%                       y=[y1; y2; y3];</span>
0064 <span class="comment">%</span>
0065 <span class="comment">% Note that in all cases the number of output samples will be less than the number of input samples if</span>
0066 <span class="comment">% the input length is not an exact number of frames. In addition, if two output arguments</span>
0067 <span class="comment">% are specified, the last partial frame samples will be retained for overlap adding</span>
0068 <span class="comment">% with the output from the next call to v_specsub().</span>
0069 <span class="comment">%</span>
0070 <span class="comment">% Refs:</span>
0071 <span class="comment">%    [1] J. Sohn, N. S. Kim, and W. Sung.</span>
0072 <span class="comment">%        A statistical model-based voice activity detection.</span>
0073 <span class="comment">%        IEEE Signal Processing Lett., 6 (1): 1-3, 1999. doi: 10.1109/97.736233.</span>
0074 <span class="comment">%    [2] Ephraim, Y. &amp; Malah, D.</span>
0075 <span class="comment">%        Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator</span>
0076 <span class="comment">%        IEEE Trans Acoustics Speech and Signal Processing, 32(6):1109-1121, Dec 1984</span>
0077 <span class="comment">%    [3] Rainer Martin.</span>
0078 <span class="comment">%        Noise power spectral density estimation based on optimal smoothing and minimum statistics.</span>
0079 <span class="comment">%        IEEE Trans. Speech and Audio Processing, 9(5):504-512, July 2001.</span>
0080 
0081 <span class="comment">%      Copyright (C) Mike Brookes 2004</span>
0082 <span class="comment">%      Version: $Id: v_vadsohn.m 10865 2018-09-21 17:22:45Z dmb $</span>
0083 <span class="comment">%</span>
0084 <span class="comment">%   VOICEBOX is a MATLAB toolbox for speech processing.</span>
0085 <span class="comment">%   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html</span>
0086 <span class="comment">%</span>
0087 <span class="comment">%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</span>
0088 <span class="comment">%   This program is free software; you can redistribute it and/or modify</span>
0089 <span class="comment">%   it under the terms of the GNU General Public License as published by</span>
0090 <span class="comment">%   the Free Software Foundation; either version 2 of the License, or</span>
0091 <span class="comment">%   (at your option) any later version.</span>
0092 <span class="comment">%</span>
0093 <span class="comment">%   This program is distributed in the hope that it will be useful,</span>
0094 <span class="comment">%   but WITHOUT ANY WARRANTY; without even the implied warranty of</span>
0095 <span class="comment">%   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the</span>
0096 <span class="comment">%   GNU General Public License for more details.</span>
0097 <span class="comment">%</span>
0098 <span class="comment">%   You can obtain a copy of the GNU General Public License from</span>
0099 <span class="comment">%   http://www.gnu.org/copyleft/gpl.html or by writing to</span>
0100 <span class="comment">%   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.</span>
0101 <span class="comment">%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</span>
0102 <span class="keyword">if</span> nargin&lt;3 || isempty(m)
0103     m=<span class="string">''</span>;
0104 <span class="keyword">end</span>
0105 <span class="keyword">if</span> isstruct(fsz)
0106     fs=fsz.fs;
0107     qq=fsz.qq;
0108     qp=fsz.qp;
0109     ze=fsz.ze;
0110     s=zeros(length(fsz.si)+length(si(:)),1); <span class="comment">% allocate space for speech</span>
0111     s(1:length(fsz.si))=fsz.si;
0112     s(length(fsz.si)+1:end)=si(:);
0113 <span class="keyword">else</span>
0114     fs=fsz;     <span class="comment">% sample frequency</span>
0115     s=si(:);
0116     <span class="comment">% default algorithm constants</span>
0117 
0118     qq.of=2;        <span class="comment">% overlap factor = (fft length)/(frame increment)</span>
0119     qq.pr=0.7;      <span class="comment">% Speech probability threshold</span>
0120     qq.ts=0.1;      <span class="comment">% mean talkspurt length (100 ms)</span>
0121     qq.tn=0.05;     <span class="comment">% mean silence length (50 ms)</span>
0122     qq.ti=10e-3;    <span class="comment">% desired output frame increment (10 ms)</span>
0123     qq.tj=10e-3;    <span class="comment">% internal frame increment (10 ms)</span>
0124     qq.ri=0;        <span class="comment">% round ni to the nearest power of 2</span>
0125     qq.ta=0.396;    <span class="comment">% Time const for smoothing SNR estimate = -tinc/log(0.98) from [2]</span>
0126     qq.gx=1000;     <span class="comment">% maximum posterior SNR = 30dB</span>
0127     qq.gz=1e-4;     <span class="comment">% minimum posterior SNR = -40dB</span>
0128     qq.xn=0;        <span class="comment">% minimum prior SNR = -Inf dB</span>
0129     qq.ne=0;          <span class="comment">% noise estimation: 0=min statistics, 1=MMSE [0]</span>
0130     <span class="keyword">if</span> nargin&gt;=4 &amp;&amp; ~isempty(pp)
0131         qp=pp;      <span class="comment">% save for v_estnoisem call</span>
0132         qqn=fieldnames(qq);
0133         <span class="keyword">for</span> i=1:length(qqn)
0134             <span class="keyword">if</span> isfield(pp,qqn{i})
0135                 qq.(qqn{i})=pp.(qqn{i});
0136             <span class="keyword">end</span>
0137         <span class="keyword">end</span>
0138     <span class="keyword">else</span>
0139         qp=struct;  <span class="comment">% make an empty structure</span>
0140     <span class="keyword">end</span>
0141 <span class="keyword">end</span>
0142 <span class="comment">% derived algorithm constants</span>
0143 qq.tj=min([qq.tj 0.5*qq.ts 0.5*qq.tn]);     <span class="comment">% at least two frames per talk/silence spurt</span>
0144 nj=max(round(qq.ti/qq.tj),1);               <span class="comment">% number of internal frames per output frame</span>
0145 <span class="keyword">if</span> qq.ri
0146     ni=pow2(nextpow2(qq.ti*fs*sqrt(0.5)/nj)); <span class="comment">% internal frame increment in samples</span>
0147 <span class="keyword">else</span>
0148     ni=round(qq.ti*fs/nj);    <span class="comment">% internal frame increment in samples</span>
0149 <span class="keyword">end</span>
0150 tinc=ni/fs;             <span class="comment">% true frame increment time</span>
0151 a=exp(-tinc/qq.ta);     <span class="comment">% SNR smoothing coefficient</span>
0152 gmax=qq.gx;             <span class="comment">% max posterior SNR = 30 dB</span>
0153 kk=sqrt(2*pi);          <span class="comment">% sqrt(8)*Gamma(1.5) - required constant</span>
0154 xn=qq.xn;                <span class="comment">% floor for prior SNR, xi</span>
0155 gz=qq.gz;               <span class="comment">% floor for posterior SNR</span>
0156 a01=tinc/qq.tn;         <span class="comment">% a01=P(silence-&gt;speech)</span>
0157 a00=1-a01;              <span class="comment">% a00=P(silence-&gt;silence)</span>
0158 a10=tinc/qq.ts;         <span class="comment">% a10=P(speech-&gt;silence)</span>
0159 a11=1-a10;              <span class="comment">% a11=P(speech-&gt;speech)</span>
0160 b11=a11/a10;
0161 b01=a01/a00;
0162 b00=a01-a00*a11/a10;
0163 b10=a11-a10*a01/a00;
0164 prat=a10/a01;       <span class="comment">% P(silence)/P(speech)</span>
0165 lprat=log(prat);
0166 <span class="comment">% calculate power spectrum in frames</span>
0167 
0168 no=round(qq.of);                                   <span class="comment">% integer overlap factor</span>
0169 nf=ni*no;           <span class="comment">% fft length</span>
0170 nd=floor(ni*(no-1)/2); <span class="comment">% output delay relative to start of frame</span>
0171 w=hamming(nf+1)'; w(end)=[];  <span class="comment">% for now always use hamming window</span>
0172 w=w/sqrt(sum(w(1:ni:nf).^2));          <span class="comment">% normalize to give overall gain of 1</span>
0173 ns=length(s);
0174 y=<a href="v_enframe.html" class="code" title="function [f,t,w]=v_enframe(x,win,hop,m,fs)">v_enframe</a>(s,w,ni);
0175 yf=<a href="v_rfft.html" class="code" title="function y=v_rfft(x,n,d)">v_rfft</a>(y,nf,2);
0176 <span class="keyword">if</span> ~size(yf,1)                                  <span class="comment">% no data frames</span>
0177     vs=[];
0178     nr=0;
0179     nb=0;
0180 <span class="keyword">else</span>
0181     yp=yf.*conj(yf);                <span class="comment">% power spectrum of input speech</span>
0182     [nr,nf2]=size(yp);              <span class="comment">% number of frames</span>
0183     nb=ni*nr;
0184     isz=isstruct(fsz);
0185     <span class="keyword">if</span> isz
0186         <span class="keyword">if</span> qq.ne&gt;0
0187             [dp,ze]=<a href="v_estnoiseg.html" class="code" title="function [x,zo]=v_estnoiseg(yf,tz,pp)">v_estnoiseg</a>(yp,ze);    <span class="comment">% estimate the noise using MMSE</span>
0188         <span class="keyword">else</span>
0189             [dp,ze]=<a href="v_estnoisem.html" class="code" title="function [x,zo,xs]=v_estnoisem(yf,tz,pp)">v_estnoisem</a>(yp,ze);    <span class="comment">% estimate the noise using minimum statistics</span>
0190         <span class="keyword">end</span>
0191         xu=fsz.xu;                  <span class="comment">% previously saved unsmoothed SNR</span>
0192         lggami=fsz.gg;
0193         nv=fsz.nv;
0194     <span class="keyword">else</span>
0195         <span class="keyword">if</span> qq.ne&gt;0
0196             [dp,ze]=<a href="v_estnoiseg.html" class="code" title="function [x,zo]=v_estnoiseg(yf,tz,pp)">v_estnoiseg</a>(yp,tinc,qp);    <span class="comment">% estimate the noise using MMSE</span>
0197         <span class="keyword">else</span>
0198             [dp,ze]=<a href="v_estnoisem.html" class="code" title="function [x,zo,xs]=v_estnoisem(yf,tz,pp)">v_estnoisem</a>(yp,tinc,qp);    <span class="comment">% estimate the noise using minimum statistics</span>
0199         <span class="keyword">end</span>
0200         xu=1;                               <span class="comment">% dummy unsmoothed SNR from previous frame [2](53)++</span>
0201         lggami=0;                            <span class="comment">% initial prob ratio</span>
0202         nv=0;                               <span class="comment">% number of previous speech samples</span>
0203     <span class="keyword">end</span>
0204     gam=max(min(yp./dp,gmax),gz);       <span class="comment">% gamma = posterior SNR [2](10)</span>
0205     prsp=zeros(nr,1);                   <span class="comment">% create space for prob ratio vector</span>
0206     <span class="keyword">for</span> i=1:nr                          <span class="comment">% loop for each frame</span>
0207         gami=gam(i,:);                  <span class="comment">% gamma(i) = a posteriori SNR [2](10)</span>
0208         xi=a*xu+(1-a)*max(gami-1,xn);   <span class="comment">% xi = smoothed a priori SNR [2](53)</span>
0209         xi1=1+xi;
0210         v=0.5*xi.*gami./xi1;            <span class="comment">% note that this is 0.5*vk in [2]</span>
0211         lamk=2*v-log(xi1);              <span class="comment">% log likelihood ratio for each frequency bin [1](3)</span>
0212         lami=sum(lamk(2:nf2))/(nf2-1);  <span class="comment">% mean log LR over frequency omitting DC term [1](4)</span>
0213         <span class="keyword">if</span> lggami&lt;0                     <span class="comment">% avoid overflow in calculating [1](11)</span>
0214             lggami=lprat+lami+log(b11+b00/(a00+a10*exp(lggami)));
0215         <span class="keyword">else</span>
0216             lggami=lprat+lami+log(b01+b10/(a10+a00*exp(-lggami)));
0217         <span class="keyword">end</span>
0218         <span class="keyword">if</span> lggami&lt;0
0219             gg=exp(lggami);
0220             prsp(i)=gg/(1+gg);
0221         <span class="keyword">else</span>
0222             prsp(i)=1/(1+exp(-lggami));
0223         <span class="keyword">end</span>
0224         gi=(0.277+2*v)./gami;           <span class="comment">% accurate to 0.02 dB for v&gt;0.5</span>
0225         mv=v&lt;0.5;
0226         <span class="keyword">if</span> any(mv)                      <span class="comment">% only calculate Bessel functions for v&lt;0.5</span>
0227             vmv=v(mv);
0228             gi(mv)=kk*sqrt(vmv).*((0.5+vmv).*besseli(0,vmv)+vmv.*besseli(1,vmv))./(gami(mv).*exp(vmv)); <span class="comment">% [2](7)</span>
0229         <span class="keyword">end</span>
0230         xu=gami.*gi.^2;                 <span class="comment">% unsmoothed prior SNR % [2](53)</span>
0231     <span class="keyword">end</span>
0232     ncol=any(m==<span class="string">'a'</span>)+any(m==<span class="string">'b'</span>);       <span class="comment">% number of output data columns</span>
0233     <span class="keyword">if</span> ~ncol
0234         m(end+1)=<span class="string">'a'</span>;   <span class="comment">% force at least one output</span>
0235         ncol=1;
0236     <span class="keyword">end</span>
0237     nw=floor(nr/nj);                <span class="comment">% number of output or plotted frames</span>
0238     <span class="keyword">if</span> any(m==<span class="string">'n'</span>) || any(m==<span class="string">'t'</span>)       <span class="comment">% frame-based outputs</span>
0239         vs=zeros(nw,2+ncol);            <span class="comment">% space for outputs</span>
0240         vs(:,1)=nd+1+nv+(0:nw-1)'*ni*nj;   <span class="comment">% starting sample of each frame</span>
0241         vs(:,2)=ni*nj-1+vs(:,1);           <span class="comment">% ending sample of each frame</span>
0242         <span class="keyword">if</span> any(m==<span class="string">'t'</span>)
0243             vs(:,1:2)=vs(:,1:2)/fs;     <span class="comment">% convert to seconds</span>
0244         <span class="keyword">end</span>
0245         <span class="keyword">if</span> any(m==<span class="string">'a'</span>)
0246             vs(:,3)=any(reshape(prsp(1:nw*nj)&gt;qq.pr,nj,[]),1).';
0247         <span class="keyword">end</span>
0248         <span class="keyword">if</span> any(m==<span class="string">'b'</span>)
0249             vs(:,end)=max(reshape(prsp(1:nw*nj),nj,[]),[],1).';
0250         <span class="keyword">end</span>
0251     <span class="keyword">else</span>
0252         na=nd*(1-isz);                  <span class="comment">% include preamble if no input state supplied</span>
0253         nc=(nargout&lt;=1)*(ns-nd-nb);     <span class="comment">% include tail if no output state desired</span>
0254         vs=zeros(na+nb+nc,ncol);
0255         vs(na+(1:nb),ncol)=reshape(repmat(prsp',ni,1),nb,1);
0256         vs(1:na,ncol)=vs(na+1,ncol);        <span class="comment">% replicate start</span>
0257         vs(na+nb+1:<span class="keyword">end</span>,ncol)=vs(na+nb,ncol); <span class="comment">% replicate end</span>
0258         <span class="keyword">if</span> any(m==<span class="string">'a'</span>)
0259             vs(:,1)=vs(:,ncol)&gt;qq.pr;
0260         <span class="keyword">end</span>
0261     <span class="keyword">end</span>
0262 <span class="keyword">end</span>
0263 <span class="keyword">if</span> nargout&gt;1
0264     zo.si=s(nd+nb+1:end);      <span class="comment">% save the tail end of the input speech signal</span>
0265     zo.fs=fs;                       <span class="comment">% save sample frequency</span>
0266     zo.qq=qq;                       <span class="comment">% save local parameters</span>
0267     zo.qp=qp;                       <span class="comment">% save v_estnoisem parameters</span>
0268     zo.ze=ze;                       <span class="comment">% save state of noise estimation</span>
0269     zo.xu=xu;                       <span class="comment">% unsmoothed prior SNR  [2](53)</span>
0270     zo.gg=lggami;                    <span class="comment">% posterior prob ratio: P(speech|s)/P(silence|s) [1](11)</span>
0271     zo.nv=nv+nd+nb;                      <span class="comment">% number of previous speech samples</span>
0272 <span class="keyword">end</span>
0273 <span class="keyword">if</span> (~nargout || any(m==<span class="string">'p'</span>)) &amp;&amp; nr&gt;0
0274     ax=subplot(212);
0275     plot((nv+nd+[0 nr*ni])/fs,[qq.pr qq.pr],<span class="string">'r:'</span>,(nv+nd+ni*nj/2+(0:nw-1)*ni*nj)/fs,max(reshape(prsp(1:nw*nj),nj,[]),[],1).',<span class="string">'b-'</span> );
0276     set(gca,<span class="string">'ylim'</span>,[-0.05 1.05]);
0277     xlabel(<span class="string">'Time (s)'</span>);
0278     ylabel(<span class="string">'Pr(speech)'</span>);
0279     ax(2)=subplot(211);
0280     plot((nv+(1:ns))/fs,s);
0281     ylabel(<span class="string">'Speech'</span>);
0282     title(<span class="string">'Sohn VAD'</span>);
0283     linkaxes(ax,<span class="string">'x'</span>);
0284 <span class="keyword">end</span></pre></div>
<hr><address>Generated by <strong><a href="http://www.artefact.tk/software/matlab/m2html/">m2html</a></strong> &copy; 2003</address>
</body>
</html>